Why Hypotheses Informed by Observation Are Often Wrong: Results of Randomized Controlled Trials Challenge Chronic Disease Management Strategies Based on Epidemiological Evidence


For researchers and clinicians engaged in the important business of trying to prevent cardiovascular events in patients with type 2 diabetes, 2010 was a year of eye-opening surprises. The revelations began early with the March 2010 release of results from the blood pressure control component of the ACCORD randomized controlled trial (RCT). 2 Initiated in 2001 by the National Heart, Lung, and Blood Institute (NHLBI), ACCORD was an important landmark clinical trial for patients with type 2 diabetes and risk factors for cardiovascular disease; it represented the first large, experimental investigation of several key treatment guidelines that were based on sound epidemiological evidence but had never been subjected to the more rigorous test of an RCT. 3 ACCORD study subjects had type 2 diabetes, with a hemoglobin A1c level of at least 7.5%, and met cardiovascular risk criteria including (a) aged 40 years or older with cardiovascular disease or (b) aged 55 years or older with atherosclerosis, albuminuria, left ventricular hypertrophy, or at least 2 cardiovascular disease risk factors (obesity, smoking, dyslipidemia, or hypertension). 2 Of the 10,251 ACCORD study subjects, 4,733 met clinical criteria for inclusion in the blood pressure control component of the trial (systolic blood pressure [SBP] between 130 and 180 millimeters mercury [mmHg], taking 3 or fewer antihypertensive medications, and "the equivalent of a 24-hour protein excretion rate of less than 1.0 [grams]"). Patients were randomized to "intensive" therapy targeted to SBP less than 120 mmHg (n = 2,362) or "standard" therapy targeted to SBP less than 140 mmHg (n = 2,371). Patients were treated using "strategies that are currently available in clinical practice" and "drug classes … shown to result in a reduction in cardiovascular events among participants with diabetes." 2 At 1-year follow-up, mean SBPs were 119.3 (95% confidence interval [CI] = 118.9-119.7) mmHg and 133.5 (95% CI = 133.1-133.8) mmHg in the intensive- and standard-therapy groups, respectively, a difference that was "significant and sustained" throughout the study. However, in a mean 4.7 years of follow-up, the SBP differences did not translate into significant improvements in the primary endpoint outcome, a composite of nonfatal myocardial infarction (MI), nonfatal stroke, or cardiovascular death; annual rates were 1.87% with intensive therapy and 2.09% with standard therapy (hazard ratio [HR] = 0.88, 95% CI = 0.73-1.06, P = 0.20). Additionally, serious adverse events attributable to antihypertensive treatment, including hypotension, bradycardia or arrhythmia, and hyperkalemia, were more common in the intensive- than standard-therapy arm (3.3% vs. 1.3%, respectively, P < 0.001). 2
On the same day that the ACCORD blood pressure study results were announced, presenters at an American College of Cardiology scientific session shared startling preliminary findings from a retrospective observational analysis of data from the International Verapamil SR-Trandolapril (INVEST) study of treatment with an angiotensin-converting enzyme (ACE) inhibitor and/or diuretic plus either a calcium channel blocker or beta-blocker for patients with hypertension and coronary artery disease. 4 Conducted by Cooper-DeHoff et al. (2010), the observational analysis assessed the rate of cardiovascular events, including the primary composite outcome of nonfatal MI, nonfatal stroke, or all-cause death, in a subset of INVEST patients who also had diabetes (type not specified, n = 6,400), comparing those who achieved "tight" control (SBP of less than 130 mmHg, n = 2,255), "usual" control (SBP of 130 mmHg to less than 140 mmHg, n = 1,970), or "uncontrolled" SBP (140 mmHg or more, n = 2,175). 
5 Notably, the "tight" target was strongly supported by treatment guidelines in place at the time, including those of the American Diabetes Association (ADA), the seventh report of the Joint National Committee (JNC-7), and the American Heart Association (AHA). 6,7,8 Yet, during a mean follow-up of approximately 2.6 years per patient, not only did tight control fail to decrease cardiovascular event rates compared with usual control; it was associated with slightly higher all-cause mortality. Although uncontrolled blood pressure was associated with an increase in the primary outcome compared with usual control (19.8% vs. 12.6%, respectively, HR = 1.46, 95% CI = 1.25-1.71, P < 0.001), rates for the usual control and tight control groups did not significantly differ (tight control = 12.7%, HR = 1.11, 95% CI = 0.93-1.32, P = 0.24). And, in an extended follow-up analysis of the 5-year period following the close of the INVEST study, rates of all-cause mortality were 22.8% for patients with tight control and 21.8% for patients with usual control (HR = 1.15, 95% CI = 1.01-1.32, P = 0.04). 5
There were even more surprises on the same day with the release of the results of the lipid-lowering component of ACCORD, a development that was later listed by the AHA and American Stroke Association as one of the "top 10 advances in cardiovascular research in 2010." 9,10 ACCORD investigators randomized a subset of patients who met clinical eligibility criteria (low-density lipoprotein cholesterol [LDL-C] of 60 to 180 milligrams per deciliter [mg per dL]; high-density lipoprotein cholesterol [HDL-C] less than 55 mg per dL for women and blacks or less than 50 mg per dL for all others; and triglyceride level below 750 mg per dL for patients not already receiving lipid-lowering therapy or below 400 mg per dL for patients receiving lipid-lowering therapy) to treatment with simvastatin + placebo (n = 2,753) or simvastatin + fenofibrate (n = 2,765). In a mean of 4.7 years of follow-up, the annual rates of the primary composite outcome (nonfatal MI, nonfatal stroke, or cardiovascular death) were 2.2% and 2.4% in the fenofibrate and placebo groups, respectively (P = 0.32). Study groups did not significantly differ on any secondary endpoint outcome, including "major coronary disease event" (a composite of nonfatal MI, unstable angina, or cardiovascular death), death from any cause, or congestive heart failure. 10 Noting that "when a study does not support the central hypothesis, it is critical to examine potential reasons for this outcome," ACCORD investigators reported the results of numerous pre-specified subgroup analyses, finding only the possibility of an interaction of treatment with sex (benefit for men and possible harm for women) and "a suggestion of heterogeneity according to baseline lipid levels: patients who had both a triglyceride level in the highest third and an [HDL-C] level in the lowest third … appeared to benefit from fenofibrate, whereas all other patients receiving fenofibrate did not." 10 Although ACCORD investigators noted that their findings were consistent with prevailing treatment guidelines "that recommend treatment for patients with hypertriglyceridemia and low [HDL-C] levels that persist despite statin therapy," the AHA deemed the results a significant advance because they "will be helpful for targeting specific treatments that best reduce CVD risk in people with diabetes." 9

Follow-Up to ACCORD Affirms Earlier Findings
Also during 2010, researchers in both ACCORD and Action in Diabetes and Vascular Disease: Preterax and Diamicron Modified Release Controlled Evaluation (ADVANCE) continued to investigate the underlying causes of their previously reported puzzling findings that intensive glycemic control in patients with type 2 diabetes does not appear to be helpful in preventing cardiovascular events and may actually cause harm. 11,12 A follow-up analysis of ACCORD's intensive glycemic control component was published in March 2011, about 3 years after termination of the tight glycemic control arm of the trial for patient safety reasons. The extended analysis over a total of 5 years of follow-up supported the previous (2008) finding of an increase in all-cause death without a significant benefit on the primary study outcome (a composite of nonfatal MI, nonfatal stroke, or cardiovascular death) for patients randomized to an "intensive" treatment arm (A1c target less than 6.0%, median before termination 6.4%), compared with patients randomized to "standard" treatment (A1c target of 7.0%-7.9%, median before termination 7.5%). 13,14 Specifically, among patients treated with the intensive strategy for a mean of 3.7 years and transitioned to standard care for the remainder of the study (mean 1.2 years), the 5-year primary outcome rate was 2.1%, compared with 2.2% for patients who received standard treatment for all 5 years (P = 0.12), and all-cause mortality rates were 1.5% and 1.3%, respectively (P = 0.02). 13 Perhaps more importantly, the ACCORD investigators still had no explanation for the findings. "We just don't understand why we are seeing this mortality signal," noted the lead investigator in a March 2011 interview, "and it is not for lack of looking. We've looked at severe hypoglycemia and it's not that, and it also doesn't seem to be caused by the rapid fall of [A1c] levels. There is no clear explanation emerging for this observation." 14

Epidemiological Evidence and Refuted Guidelines: ACCORD as a Case in Point
One of ACCORD's most important objectives was to test the "lower-is-better" view of A1c in type 2 diabetes which, at the time the trial was launched in 2001, had never been tested in an RCT but was widely held because of observational studies showing an association between higher A1c and poor outcomes. 1,3,15 Among these was a study by Stratton et al. (2000) that is notable for its high quality. 16 In a retrospective observational analysis of data gathered in the United Kingdom Prospective Diabetes Study (UKPDS), Stratton et al. categorized 3,642 patients with type 2 diabetes by A1c level, measured using an appropriate time-varying method. Cox proportional hazards regressions assessed the rates of negative patient outcomes (e.g., diabetic complications, MI, death) for patients at higher versus lower levels of A1c, controlling for numerous factors including demographics, smoking, LDL-C, triglyceride level, albuminuria, and SBP. Poisson regressions assessed incidence rates of patient outcomes, controlling for demographics and duration of diabetes.
The results of the observational analysis were striking; each 1% reduction in A1c was associated with hazard decreases of 21% for diabetes-related death (95% CI = 15%-27%), 14% for MI (95% CI = 8%-21%), and 37% for microvascular complications (95% CI = 33%-41%). For patients with an A1c less than 6%, the unadjusted rate of diabetes-related deaths per 1,000 person-years of follow-up was 5.5, compared with 11.5 for patients whose A1c was within the accepted target range at the time, 7.0%-7.9%. 3,16 Patients with A1c less than 6% experienced better outcomes than patients with A1c of 7.0%-7.9% on several other key measures, including all-cause mortality (11.1 vs. 18.7, respectively), fatal or nonfatal MI (10.1 vs. 16.6), and the development of complications related to diabetes (24.9 vs. 43.6). 16 Although the research team cautioned that study results represented epidemiological associations, not necessarily a causal relationship between antidiabetic treatment to normal A1c and positive outcomes, they nonetheless concluded that "there is no specific target value of [A1c] for which one should aim but … the nearer to normal the [A1c] concentration the better" and that "any reduction in [A1c] is likely to reduce the risk of complications, with the lowest risk being in those with [A1c] values in the normal range (< 6.0%)." 16
Viewed against the backdrop of the exceptionally promising findings of Stratton et al., the ACCORD trial results were understandably disappointing to clinicians and researchers, who were described as "flummoxed" in a media account published after termination of the intensive glycemic control experiment in February 2008. 1 Yet, although the ACCORD trial results were, in a sense, surprising because of the strength of the observational evidence used to develop the study hypothesis, some knowledgeable observers noted that the replacement of treatment protocols based on observational results with new protocols based on stronger randomized evidence is a predictable and oft-repeated pattern in medical science. One observer summed up the ACCORD results by saying that "once again, the most sound of scientific and biologic plausibility has been refuted by a large clinical-outcomes trial." 1 The research literature is replete with such examples of evolving evidence, including research into the benefits of hormone replacement therapy in postmenopausal women, treatment algorithms for chronic kidney disease, and influenza vaccination of persons aged 65 years or older to prevent mortality. 17
Indeed, the blood pressure control guidelines that were supported by so many key organizations, yet contradicted by ACCORD, were originally based on epidemiological analyses. 3,4,5 ADA guidelines noted in 2010 that because these analyses showed "that blood pressure > 115/75 mmHg is associated with increased cardiovascular event rates and mortality in individuals with diabetes," a "target blood pressure goal of < 130/80 mmHg is reasonable if it can be achieved safely." 6 A 2002 ADA guideline, in place at the approximate time of ACCORD's initiation, noted that "there is no threshold value for [blood pressure], and risk continues to decrease well into the normal range." 18

Why Are Informed Hypotheses Refuted So Often?
Replacement of initial epidemiological observation with more rigorous experimental evidence is common, desirable, and inherent to good scientific progress. Yet, the pattern is still somewhat puzzling on the surface. Why is observational evidence seemingly so likely to produce an erroneous conclusion? A look at some of the more common explanations suggests the seriousness of the problem and provides guidance for improvement.

Regression to the Mean (RTM).
Although not a factor in the epidemiological research leading to the guidelines for patients with type 2 diabetes that were ultimately refuted by ACCORD, RTM has plagued many an observational analysis of interventions to improve outcomes in chronic disease care. The cause of RTM is "within-subject variability," that is, random fluctuations in natural phenomena measured repeatedly in any person or other research subject. 19 Because of within-subject variability, "when observing repeated measurements in the same subject, relatively high (or relatively low) observations are likely to be followed by less extreme [values] nearer the subject's true mean." 20 Thus, when a group of study subjects is selected on the basis of abnormally high values on any measure (e.g., elevated A1c or SBP), a decline in the elevated values on average is mathematically expected; the same is true of increases in groups initially selected for abnormally low values (e.g., HDL-C). These changes are expected regardless of any intervention.
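The selection mechanism described above can be illustrated with a small simulation. The following is a minimal sketch, not an analysis of any study cited here: it assumes a hypothetical A1c-like measure, normally distributed true levels, and independent normal within-subject fluctuation, and every parameter value (means, standard deviations, the 9.5 cutoff) is an illustrative assumption. No intervention is simulated, so any decline in the selected group is attributable to RTM alone.

```python
import random
import statistics


def simulate_rtm(n_subjects=10_000, true_mean=8.0, between_sd=1.0,
                 within_sd=0.7, cutoff=9.5, seed=0):
    """Simulate two measurements per subject with NO intervention.

    Each subject has a stable true level plus independent random
    within-subject fluctuation at each measurement. Subjects are
    selected because their FIRST measurement is at or above `cutoff`.
    Returns (mean baseline, mean follow-up) for the selected subgroup.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    baseline, follow_up = [], []
    for _ in range(n_subjects):
        true_level = rng.gauss(true_mean, between_sd)
        first = true_level + rng.gauss(0, within_sd)   # baseline reading
        second = true_level + rng.gauss(0, within_sd)  # follow-up reading
        if first >= cutoff:  # selection on an extreme baseline value
            baseline.append(first)
            follow_up.append(second)
    return statistics.mean(baseline), statistics.mean(follow_up)


if __name__ == "__main__":
    base, follow = simulate_rtm()
    print(f"selected group: baseline {base:.2f}, follow-up {follow:.2f}")
```

Because the follow-up reading is drawn around the same true level as the baseline reading, the selected group's follow-up mean falls below its baseline mean even though nothing was done to these simulated subjects; a pre-post comparison without a randomized control group would credit that decline to whatever program happened to be in place.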
For example, Domurat (1999) studied an intensive disease state management (DSM) program (endocrinologist care, case management, routine screenings, and patient education) for patients with diabetes in a managed care organization. 21 The DSM program was targeted to the highest-risk 30% of patients with diabetes who had "multiple hospital, emergency department, or urgent clinic admissions or visits," recent complications of diabetes, comorbidities (e.g., uncontrolled hypertension or "general debility"), "poor understanding of disease self-care," or other risk factors. In addition to better screening rates (e.g., for proteinuria, lipids) and lower hospitalization rates for the DSM patients, Domurat found that among both DSM patients and a comparison group of patients with diabetes who did not participate in the DSM program, patients whose laboratory results were within goal at baseline on average experienced worsened outcomes from baseline to follow-up (e.g., mean A1c changed from 7.0% to 7.5% in DSM), whereas among patients not in goal at baseline, outcomes improved (e.g., mean A1c declined by 1.3 points from 10.7% to 9.4% in DSM and by 2.4 points from 11.1% to 8.7% in usual care). 21 Similar RTM patterns have been noted in other DSM program analyses but were often misinterpreted by study authors as programmatic effects. 19,22
In contrast, in the Medicare Health Support (MHS) Chronic Disease Pilot Program, which randomized Medicare beneficiaries with heart failure or diabetes to a DSM program or a control (usual care) group, improvements in patient outcomes (hospitalization rates, health care costs) were observed in both study groups. A statistical analysis by MHS evaluators, Cromwell et al. (2008), appropriately assessed whether the declines were greater in the MHS DSM group compared with the control group, finding no significant between-group differences. 23 Although previously reported weak observational analyses had calculated enormous (up to 14:1) return-on-investment estimates for DSM programs, the randomized MHS pilot results did not meet a key contractual requirement set by the Centers for Medicare & Medicaid Services, "at least budget neutrality through the pilot's first 12 months," resulting in termination of the project in 2008. 22,23 Notably, Cromwell et al. observed in an early report that "it appears that both disease groups are regressing to the mean with fewer subsequent admissions for the condition that qualified beneficiaries for the program," and that the organizations providing DSM "may have substantially overestimated the success of their intervention in reducing other hospital admissions-at least relative to a randomly matched comparison group also regressing-to-the-mean." 23 When randomization is not possible, mathematical formulae to adjust observational results for RTM are available, 19,20 but these are seldom used in the published managed care pharmacy literature.

Confounding.
Among the most ubiquitous problems in observational research is confounding, sometimes called spurious association, in which the observed relationship between an independent variable and the outcome measures is partially or entirely attributable to a different factor that is systematically related to both the independent variable and the outcome. Sometimes, the problem is the "healthy adherer effect," in which both adherence to placebo and adherence to beneficial treatment are associated with positive outcomes. 24,25 When observational analyses indicate that adherence to a particular treatment predicts positive outcomes, the results could be attributable to the treatment itself or to other causal factors for the outcome that are related to (but not caused by) treatment adherence (e.g., healthier diet, self-protective behaviors). 26 Sometimes, the measured predictor is a marker for the confounding factor, severity of disease. For example, in an analysis that compared patients with versus without anemia, Nissenson et al. (2005) found that services clearly attributable to anemia (e.g., transfusions) represented only about 5%-11% of the between-group all-cause health care cost difference. 27 The authors suggested 2 possible interpretations: either their classification algorithm had failed to measure the true cost of anemia, or anemia was a marker for advanced stage or severity of the underlying disease that qualified the patient for entry into the study sample, such as chronic kidney disease, human immunodeficiency virus, or congestive heart failure. 27 An interesting recent example of potential confounding because of a marker independent variable was provided in a study by Zoungas et al. (2010), which assessed the possibility that severe hypoglycemia had caused harm in patients assigned to the intensive glucose-lowering arms of ACCORD, ADVANCE, and the Veterans Affairs Diabetes Trial. 
12,28 In their analysis of the relationship between severe hypoglycemia and risks of macrovascular or microvascular adverse events in the ADVANCE study sample, Zoungas et al. found that severe hypoglycemia was associated with increased risks of these vascular events […] and all-cause death (HR = 2.69, 95% CI = 1.97-3.67), but was also significantly associated with a variety of respiratory, digestive, and skin conditions (P < 0.001). Noting the "range of adverse clinical outcomes" with which severe hypoglycemia was associated, Zoungas et al. suggested that "it is possible that severe hypoglycemia contributes to adverse outcomes, but these analyses indicate that hypoglycemia is just as likely to be a marker of vulnerability to such events." 28
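The "healthy adherer effect" described in this section can be made concrete with a simulation in which the treatment has no effect at all. This is a minimal sketch under stated assumptions: the binary "health-conscious" trait, the adherence probabilities (0.8 vs. 0.3), and the event risks (10% vs. 20%) are hypothetical values chosen for illustration and are not drawn from any cited study.

```python
import random


def simulate_counts(n=50_000, seed=1):
    """Simulate patients taking a treatment with NO true effect.

    A hypothetical trait ("health-conscious") raises the chance of
    adherence and independently lowers the chance of an adverse event.
    Returns {(health_conscious, adherent): [events, patients]}.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        adherent = rng.random() < (0.8 if health_conscious else 0.3)
        # Event risk depends only on the trait, never on adherence.
        event = rng.random() < (0.10 if health_conscious else 0.20)
        cell = counts.setdefault((health_conscious, adherent), [0, 0])
        cell[0] += int(event)
        cell[1] += 1
    return counts


def event_rate(counts, adherent, health_conscious=None):
    """Event rate among adherent (or non-adherent) patients.

    With health_conscious=None the rate is crude (trait ignored);
    otherwise it is computed within that stratum of the trait.
    """
    events = total = 0
    for (trait, adh), (ev, tot) in counts.items():
        if adh != adherent:
            continue
        if health_conscious is not None and trait != health_conscious:
            continue
        events += ev
        total += tot
    return events / total


if __name__ == "__main__":
    counts = simulate_counts()
    print("crude:", event_rate(counts, False), "vs.", event_rate(counts, True))
    for trait in (True, False):
        print("stratum", trait, ":", event_rate(counts, False, trait),
              "vs.", event_rate(counts, True, trait))
```

In the crude comparison, adherent patients appear to fare better than non-adherent patients; within each stratum of the trait, the apparent benefit disappears. In real data the trait is typically unmeasured, which is why stratification or adjustment cannot be relied upon to remove it.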

Overreliance on Multivariate Analysis.
Although performing multivariate analyses to adjust for confounding variables is important, sole reliance on multivariate statistical analysis is not a recommended method because bias and "residual confounding," that is, confounding not reflected in measured variables, are possible. 29 This problem has several implications. First, both adjusted and unadjusted results should be calculated and presented to enable readers to determine the effect of statistical adjustment on study findings. 16,29 Second and related, a measure of the overall quality of the statistical model should be calculated and reported. 30 Poor quality, such as a linear regression analysis that explains only 1% of the variance (i.e., R² = 0.01) or a logistic regression analysis that performs little better than random assignment in accurately predicting group membership (i.e., c-statistic of 0.50-0.60), is a sign that the analytic results may be compromised by unmeasured confounders. Third, interpretation of observational evidence should be cautious, taking into account the possibility of residual confounding. 29 Common problems in statistical analyses performed to adjust for confounding factors in observational research include failure to explain covariate adjustments transparently, 29 use of propensity-score matching without assessing the accuracy of the logistic regression model used in calculating the propensity score to predict treatment selection, 31,32 and/or matching cohorts based solely on demographics and plan design rather than on relevant clinical factors. 33
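For readers less familiar with the c-statistic mentioned above, the following is a minimal sketch of how it can be computed from a model's predicted probabilities and the observed binary outcomes. The function name and the example inputs are illustrative assumptions; statistical packages report the same quantity (often as the area under the ROC curve) with more efficient algorithms.

```python
def c_statistic(predicted_probs, outcomes):
    """Concordance (c) statistic for a binary-outcome model.

    Proportion of all (event, non-event) pairs in which the subject
    with the event received the higher predicted probability; tied
    pairs count as one half. A value of 0.5 means the model
    discriminates no better than chance; 1.0 is perfect.
    """
    events = [p for p, y in zip(predicted_probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Hypothetical predictions for four subjects, two with the outcome.
    print(c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

Applied to a propensity-score model, the "outcome" is treatment selection; a value near 0.5 indicates that the measured covariates explain little about who received treatment, leaving ample room for unmeasured confounders.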

Failure to Investigate Specific Associations.
High-quality observational studies investigate the specific nature of the process underlying the association between the predictor variables and study outcomes, 30,34 for example, whether an association of adherence to statins and all-cause health care costs is attributable to hospitalizations for MIs and heart failure, versus hospitalizations for accidents or another outcome unlikely to be caused by statin treatment. Measurement of only all-cause utilization, rather than disease-specific utilization, in assessing study outcomes makes interpretation problematic. 35 In a related problem, one also sometimes sees research reports in which an author suggests that one factor might influence another, but actually measures and reports a different factor entirely. For example, in a 2007 study of drug therapy for depression, the authors noted an association between treatment with particular "first-line" drugs and favorable all-cause health care utilization patterns (increased physician visits, reduced hospitalizations, and no significant differences in emergency room [ER] visits). 36 The authors speculated that the favorable side-effect profiles of the drugs might have improved antidepressant adherence, resulting in better outcomes; however, they did not provide any analyses to support that conclusion, such as an analysis of depression-related health care utilization or a "dose-response" analysis comparing outcomes for drugs with better versus worse side effects. Similarly, a cross-sectional analysis by Goldman et al. 
(2006) measured the associations between (a) copayments and medication compliance and (b) compliance and hospital and ER use; the authors then performed a simulation analysis of the changes in ER and hospital use that might be expected from copayment changes but did not actually analyze the relationship between either copayment (cross-sectionally) or copayment change (longitudinally) and hospital or ER utilization. Nonetheless, the authors concluded that changing copayments according to "therapeutic need" would "reduce hospitalizations and [ER] use." 37 Less common but also problematic is the failure in retrospective analysis of administrative claims to investigate the time sequence in which events occurred, 30 sometimes drawing conclusions that would potentially require the predictor event to occur after the study outcome. For example, in a retrospective observational analysis of patients with type 1 or type 2 diabetes in primary care, Menzin et al. (2010) measured the association between glycemic control, defined as average A1c value (sum of all A1c values obtained for each patient over a follow-up period of 1 to 5 years, divided by the number of tests), and diabetes-related hospitalizations during the same period. 38 Although the hospitalizations could have occurred prior to the A1c tests intended to predict them, the study authors concluded that their results "showed a significant, positive, and graded relationship between 1-point A1c intervals and rates of diabetes-related hospitalizations." 38 Similarly, a frequently cited study by Sokol et al. (2005) assessed associations between medication adherence for several chronic conditions and medical utilization in several categories (disease-related and all-cause medical costs and hospitalizations), concluding that "for some chronic conditions, increased drug utilization can provide a net economic return when driven by improved adherence with guidelines-based therapy." 
39 However, because the medication adherence and medical outcomes were measured during the same 12-month period, patterns of adherence or nonadherence could have occurred after the hospitalizations that they were intended to predict.

Selection Effect.
A special category of confounding is selection effect, in which at least 1 factor associated with group selection is also systematically associated with study outcomes. Studies of programs to improve chronic disease care management have been affected considerably by this problem, often coupled with and exacerbated by an RTM effect, for 2 reasons.
First, patients may self-select into care management programs because they are especially motivated to improve their health outcomes; researchers may attribute the outcomes to the DSM program when, in reality, the outcomes may be partially attributable to the additional motivation. For example, in the Asheville Project studies of medication therapy management provided in community pharmacies, pre-intervention versus post-intervention comparisons were used. [40][41][42] A 2008 report of an Asheville Project program for patients with hypertension and/or dyslipidemia noted that study patients "agreed to complete education classes related to cardiovascular risk reduction and to be matched with a participating care manager/coach with whom they would meet on a regular, long-term basis … as frequently as once a month." 42 Thus, the analysis was, in effect, limited to patients sufficiently motivated to make a substantial time commitment to chronic disease care. In the MHS, patients randomized to DSM could choose whether to participate; those participating were, on average, "healthier" and "less costly" at baseline than patients who were randomized to DSM but opted out. 23 Second, DSM programs sometimes target the highest-risk subsets of patients, commonly imposing program selection criteria such as recent hospitalization or emergency room use. 19,21 When patients with extremely high resource utilization are selected for a program, the natural phenomenon of RTM is intensified, often erroneously making program results appear dramatic when a nonequivalent comparison group is used. Linden provides an interesting illustration of this phenomenon in an analysis of the average health care costs for patients with coronary artery disease who were enrolled in a health plan that provided no health management programs, such as DSM. 
Among those in the top cost quintile in 2001, mean costs dropped precipitously in 2002, by a remarkable $24,000, whereas for all other plan members (bottom 4 quintiles), costs increased by a modest $920 from 2001 to 2002. 19 An observational analysis of a DSM program targeted to the highest-risk quintile would have found greatly improved outcomes for the DSM group compared with the lower-risk non-DSM patients, but the "results" of the analysis in Linden's example were achieved solely because of the differential effects of RTM on patients selected for high versus low baseline utilization (and without any DSM program at all).

No Reason Identified (Yet): Erroneous Conclusions Despite High Quality.
In assessing observational results, researchers often point to criteria for assessing causality in epidemiologic studies. Although specific criteria vary, the most common and classic (dating back to analyses of the relationship between smoking and lung cancer in the 1960s) are as follows: 43,44
• Consistency of association: a pattern that is consistently observed in multiple studies is more likely to be causal than a pattern documented in isolated studies;
• Strength of association: a statistically strong relationship is more likely to be causal than a weak one;
• Dose-response: a relationship is more likely to be causal if increasing doses of the predictive factor are associated with an increase in the outcome;
• Plausibility: a relationship is more likely to be causal when a plausible biologic explanation for it is known; and,
• Temporality: for A to cause B, A must precede B.
Despite the compelling logic of these 5 criteria, it is important to recognize that the Stratton et al. observational analysis of the relationship between A1c and complications of diabetes, which contributed to the ultimately refuted "lower-is-better" paradigm for A1c, met all 5. And, numerous analyses of seemingly every conceivable explanation for the negative results of "lower-is-better" in ACCORD have as yet been unable to explain the findings. A March 2010 status report from the NHLBI noted that ACCORD researchers would "continue to analyze the ACCORD data to try to understand why these 'intensive' interventions did not reduce the rates of cardiovascular outcomes as hypothesized," examining participant characteristics and drug effects, potentially to "generate ideas for future studies." 3 Thus, one of the many important lessons to be learned from ACCORD is methodological: even high-quality epidemiological analyses may lead to conclusions about causality that ultimately prove to be erroneous, and their correction is part of the progression of knowledge.

Observation: It's (Only) a Start
In a 2008 commentary on the need to assess quality of evidence in making treatment recommendations and decisions, developers of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for assessing evidence quality in treatment guideline creation reminded us that poor-quality evidence translated into weakly supported guidelines has at times led to suboptimal patient care and even to patient harm. 45 Among the examples cited by the GRADE team was U.S. Food and Drug Administration (FDA) approval of the antiarrhythmic agents encainide and flecainide "on the basis of the drugs' ability to reduce asymptomatic ventricular arrhythmias associated with sudden death;" however, "because arrhythmia reduction reflected only indirectly on the outcome of sudden death," the quality of the evidence was low. An RCT conducted after approval showed that the drugs increased the rate of sudden death, and "appropriate attention to the low quality of the evidence would have saved thousands of lives." Similarly, the GRADE team argued that "expert recommendations lagged a decade behind the evidence from well conducted [RCTs] that thrombolytic therapy achieved a reduction in mortality in [MI]." 45
For patients and clinicians, the stakes in the assessments of scientific evidence about treatments for chronic disease are high. As editorialists Montori and Fernández-Balsells (2009) noted in their call for "an evidence-based about-face" on glycemic control in type 2 diabetes: "Although we should not dismiss potentially effective approaches (for example, early tight glycemic control for patients with newly diagnosed diabetes), we require additional research to confirm or refute such approaches before we impose them on patients." 46 Sometimes, the results of observational studies and RCTs are consistent, as were the secondary analysis of INVEST data by Cooper-DeHoff et al. 5 and the ACCORD blood pressure RCT; 2 but sometimes, they are not, and it is not possible to predict RCT outcomes with certainty. In addition to reminding us, again, of the importance of evidence grading systems in policymaking, recent RCT findings from ACCORD, ADVANCE, and rigorous studies of chronic disease management interventions should highlight the importance of humility in our interpretations, especially of epidemiological associations and other observational evidence.